# feat(temporal): anti-drift injection for all provider requests #2574

dwillitzer wants to merge 1 commit into router-for-me:main
## Conversation

**Code Review**
This pull request introduces model health monitoring, OpenRouter metadata enrichment, and a temporal injection feature that adds current date and time context to LLM requests. It also expands model metadata in API responses and updates routing logic for Claude-related User-Agents. Feedback focuses on thread-safety issues in the new temporal package, specifically recommending the use of atomic operations for the global request counter to prevent race conditions.
```go
// requestCounter tracks how many requests have been seen for interval-based injection.
var requestCounter uint64

if cfg.InjectInterval <= 0 {
	return true
}
requestCounter++
return requestCounter%uint64(cfg.InjectInterval) == 0
```
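The review summary above recommends atomic operations for this counter. A minimal sketch of that fix, assuming the counter is package-level and shared across request goroutines (`shouldInjectAtInterval` is a hypothetical stand-in for the real `ShouldInject`):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// requestCounter is shared by concurrent requests, so it must be
// advanced with sync/atomic rather than a plain increment.
var requestCounter uint64

// shouldInjectAtInterval reports whether the current request falls on
// the injection interval; interval <= 0 means "inject every request".
func shouldInjectAtInterval(interval int) bool {
	if interval <= 0 {
		return true
	}
	n := atomic.AddUint64(&requestCounter, 1)
	return n%uint64(interval) == 0
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			shouldInjectAtInterval(2)
		}()
	}
	wg.Wait()
	// With atomic increments no updates are lost, under -race or otherwise.
	fmt.Println(atomic.LoadUint64(&requestCounter)) // 1000
}
```

The plain `requestCounter++` in the diff compiles to a load-add-store sequence, so concurrent requests can drop increments and shift the injection phase; `atomic.AddUint64` makes each increment indivisible.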
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 71a6b44de3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```go
system.ForEach(func(_, v gjson.Result) bool {
	newSystem = append(newSystem, v.Value())
```
**Preserve scalar system prompts during Claude injection**

This rewrite path is entered whenever a top-level `system` key exists, but `injectIntoClaudeSystem` only preserves existing entries by iterating `system.ForEach` and never normalizes scalar `system` values. The repo still supports string system prompts elsewhere (e.g. the Claude executor handles `system` strings), so payloads like `{"system":"...","messages":[...]}` can have their original system instruction dropped or reshaped when temporal injection runs, changing model behavior.
```go
var newMessages []any
newMessages = append(newMessages, temporalMsg)
```
**Avoid prepending temporal message before Imagen prompt parse**

Prepending the temporal message at `messages[0]` breaks the Imagen compatibility path that derives prompt text from the first non-empty `messages.#.content` entry (`internal/runtime/executor/gemini_vertex_executor.go`, loop at lines 125-130). After this change, messages-format Imagen requests can use the injected temporal block as the prompt instead of the user's prompt, producing incorrect image generations.
💡 Codex Review — Reviewed commit: 115cb5f0ab
```go
// OpenAI format: messages array exists
if messages := gjson.GetBytes(payload, "messages"); messages.IsArray() {
	return injectIntoMessages(payload, messages, text)
}

return payload
```
**Inject temporal tags for Gemini-style payloads**

`InjectIntoPayload` only handles top-level `system` and `messages`, then returns the payload unchanged for other schemas. Native Gemini/Vertex requests use `contents` payloads, so this path skips temporal injection entirely and anti-drift is not applied on those provider routes, despite this change advertising provider-wide request injection.
```go
globalRegistry := registry.GetGlobalRegistry()

// Get all available models with full metadata
models := globalRegistry.GetAvailableModels("openai")
```
**Report health from all registered models**

`GetModelsHealth` starts from `GetAvailableModels("openai")`, which excludes models whose clients are fully suspended/unavailable. That causes `/models/health` to omit exactly the models operators need to debug during outages, contradicting the handler's stated goal of reporting all registered models.
💡 Codex Review — Reviewed commit: 50ed89fcbd
```go
// Inject temporal context into the request payload
if temporal.ShouldInject(temporal.DefaultConfig()) {
	req.Payload = temporal.InjectIntoPayload(req.Payload, req.Model)
```
**Resolve model alias before deciding temporal injection**

This injects temporal text based on `req.Model` before alias/model-pool resolution, so the image-model safeguard can be bypassed when users call an aliased route model (for example, `img`) that later resolves to an Imagen upstream model. In that case `InjectIntoPayload` sees a non-image alias and prepends a system message, then `executeMixedOnce` rewrites to the Imagen model (`execReq.Model = upstreamModel`), and Imagen prompt extraction can consume the injected block instead of the user prompt. Fresh evidence: alias remapping happens only after this line, so the prior Imagen fix does not apply to aliased image routes.
💡 Codex Review — Reviewed commit: efb9735bfa
```go
	enhancedModels = append(enhancedModels, enhancedModel)
}

// Build the sources map indicating where context_length came from
sources := registry.BuildModelSources(globalRegistry)
```
**Define newly referenced OpenRouter registry functions**

This change introduces calls to `registry.BuildModelSources`, `registry.GetOpenRouterLastRefresh`, `registry.TriggerOpenRouterRefresh`, and `registry.StartOpenRouterEnrichment`, but no corresponding definitions are added in the `internal/registry` package. That leaves these references unresolved at build time, so server/management packages cannot compile until the symbols are implemented (or the calls are removed).
```go
// OpenAI format: messages array exists
if messages := gjson.GetBytes(payload, "messages"); messages.IsArray() {
	return injectIntoMessages(payload, messages, text)
```
**Inject temporal tags for OpenAI Responses payloads**

`InjectIntoPayload` only injects when a top-level `system` or `messages` field is present, and otherwise returns the body unchanged. Requests sent through `/v1/responses` use the Responses schema (`input`/`instructions`) and are forwarded as-is, so temporal anti-drift is skipped for that endpoint family despite this feature being applied in the central conductor path.
💡 Codex Review — Reviewed commit: e8ed002894
```go
if info := registry.GetModelInfo(modelID, ""); info != nil {
	info.ContextLength = ctxLen
	enrichedCount++
```
**Update registry entries instead of cloned model info**

`enrichModelsFromOpenRouter` calls `GetModelInfo` and assigns `info.ContextLength`, but `GetModelInfo` returns a cloned `ModelInfo`, so the write is discarded and never updates `r.models`. That makes enrichment effectively a no-op even though `enrichedCount` is incremented and the logs say models were enriched, leaving downstream model metadata unchanged.
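The clone-write bug and the write-through fix can both be shown in a few lines. This is an illustrative registry, not the repo's actual `internal/registry` types; `SetContextLength` is a hypothetical mutator:

```go
package main

import (
	"fmt"
	"sync"
)

// ModelInfo mirrors a registry entry (simplified).
type ModelInfo struct {
	ID            string
	ContextLength int
}

// Registry stores canonical entries behind a lock.
type Registry struct {
	mu     sync.RWMutex
	models map[string]*ModelInfo
}

// GetModelInfo returns a defensive clone; writes to it are discarded.
func (r *Registry) GetModelInfo(id string) *ModelInfo {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if m, ok := r.models[id]; ok {
		clone := *m
		return &clone
	}
	return nil
}

// SetContextLength mutates the stored entry under the lock — the
// write-through an enrichment pass actually needs.
func (r *Registry) SetContextLength(id string, ctxLen int) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	m, ok := r.models[id]
	if !ok {
		return false
	}
	m.ContextLength = ctxLen
	return true
}

func main() {
	r := &Registry{models: map[string]*ModelInfo{"gpt-x": {ID: "gpt-x"}}}

	// Buggy pattern: mutating the clone leaves the registry unchanged.
	if info := r.GetModelInfo("gpt-x"); info != nil {
		info.ContextLength = 128000
	}
	fmt.Println(r.GetModelInfo("gpt-x").ContextLength) // 0

	// Write-through pattern: update the stored entry instead.
	r.SetContextLength("gpt-x", 128000)
	fmt.Println(r.GetModelInfo("gpt-x").ContextLength) // 128000
}
```

Returning clones from read APIs is a sound design for thread safety; the fix is to add a dedicated mutator rather than to hand out interior pointers.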
```go
lastRefresh := registry.GetOpenRouterLastRefresh()
c.JSON(http.StatusOK, gin.H{
	"status":         "refreshed",
	"enriched_count": count,
```
**Return true enriched count in refresh response**

`RefreshModels` publishes `enriched_count` from `TriggerOpenRouterRefresh`, but that function returns the size of the OpenRouter cache (`len(contextLength)`), not how many registered models were enriched in this refresh. This can report misleadingly large counts (or wrong error semantics when the cache is empty due to fetch failure), so operators cannot trust the endpoint output.
**luispater** left a comment
Summary:
I found two blocking issues in this change set.
Key findings:
- Blocking: the temporal injector is enabled unconditionally via `temporal.DefaultConfig()` in the execution path, so every proxied request now gets a hidden system prompt with no user opt-in and no configuration escape hatch. That changes request semantics for all providers and breaks API compatibility rather than preserving it.
- Blocking: `internal/registry/openrouter_enrichment.go` adds a new `http.Client{Timeout: 30 * time.Second}` / `context.WithTimeout` fetch path. This repo explicitly restricts new timeout-driven network behavior to a small set of documented exceptions, and this enrichment path is not one of them.
Test plan:
- Reviewed the temporal injection flow and the new registry enrichment loop plus management endpoints.
**luispater** left a comment
Summary:
I like the goal, but this PR currently mixes the temporal injection feature with unrelated registry/management work and introduces a couple of blockers.
Blocking findings:
- `sdk/cliproxy/auth/conductor.go` hard-codes `temporal.DefaultConfig()` at the injection call sites instead of wiring the feature to runtime config, which makes this always-on request mutation impossible to disable or roll back if it causes provider regressions.
- `internal/registry/openrouter_enrichment.go` adds a new long-lived background HTTP poller with its own timeout/refresh policy. That is a separate runtime/network behavior change from the temporal feature and should be reviewed independently instead of landing as part of this PR.
Test plan:
- Reviewed the temporal injection path, the new management endpoints, and the OpenRouter enrichment worker.
💡 Codex Review — Reviewed commit: 8fb9745510
```go
func (m *Manager) temporalConfig() temporal.Config {
	cfg, _ := m.runtimeConfig.Load().(*internalconfig.Config)
	if cfg != nil {
		return cfg.Temporal
```
**Fall back to temporal defaults when config omits section**

`temporalConfig()` returns `cfg.Temporal` whenever a runtime config is loaded, but `Config.Temporal.Enabled` defaults to false when the `temporal` block is absent in YAML. Because `SetConfig` always stores a non-nil config, `ShouldInject` sees `Enabled=false` and skips injection for all normal deployments unless users explicitly add `temporal.enabled: true`, which silently disables the feature this commit introduces.
💡 Codex Review — Reviewed commit: 956ce7f040
```go
if temporal.ShouldInject(m.temporalConfig()) {
	execReq.Payload = temporal.InjectIntoPayload(execReq.Payload, upstreamModel)
```
**Apply temporal injection in CountTokens path**

Temporal rewriting is added for Execute/ExecuteStream here, but `executeCountMixedOnce` still sends the original payload to CountTokens. Endpoints that preflight with token-count APIs (for example Claude/Gemini count routes via `ExecuteCountWithAuthManager`) will undercount because the injected temporal system block is missing there, so a request can pass counting and then fail real execution with token/context limits.
💡 Codex Review — Reviewed commit: 33f0a789b3
```go
if cfg != nil && cfg.Temporal != nil {
	return *cfg.Temporal
```
**Merge temporal defaults when config block is partially set**

`temporalConfig()` returns `*cfg.Temporal` as-is whenever the block exists, so omitted YAML fields keep Go zero-values. A config like `temporal: { inject_interval: 5 }` leaves `Enabled=false`, causing `ShouldInject` to skip all requests even though defaults are documented as enabled-by-default; this silently disables the feature for partial configs instead of applying defaults plus overrides.
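The defaults-plus-overrides resolution this comment asks for can be sketched with pointer fields, so "unset" is distinguishable from an explicit zero. The struct names and the enabled-by-default assumption here are illustrative, not the repo's actual types:

```go
package main

import "fmt"

// Config is the resolved temporal configuration.
type Config struct {
	Enabled        bool
	InjectInterval int
}

// YAMLTemporal mirrors the YAML block with pointer fields so an
// omitted key stays nil instead of collapsing to the zero value.
type YAMLTemporal struct {
	Enabled        *bool `yaml:"enabled"`
	InjectInterval *int  `yaml:"inject_interval"`
}

// defaultConfig assumes the feature ships enabled-by-default, as the
// comment says the docs claim; adjust if the real default differs.
func defaultConfig() Config {
	return Config{Enabled: true, InjectInterval: 0}
}

// resolve applies defaults first, then only the fields the user set.
func resolve(y *YAMLTemporal) Config {
	cfg := defaultConfig()
	if y == nil {
		return cfg
	}
	if y.Enabled != nil {
		cfg.Enabled = *y.Enabled
	}
	if y.InjectInterval != nil {
		cfg.InjectInterval = *y.InjectInterval
	}
	return cfg
}

func main() {
	five := 5
	partial := &YAMLTemporal{InjectInterval: &five} // temporal: { inject_interval: 5 }
	got := resolve(partial)
	fmt.Println(got.Enabled, got.InjectInterval) // true 5
}
```

With this shape, `temporal: { inject_interval: 5 }` keeps the default `Enabled` value while honoring the override, which is exactly the behavior the `*cfg.Temporal` pass-through loses.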
```go
for modelID, ctxLen := range openRouterContextLengths {
	openRouterStore.contextLength[modelID] = ctxLen
```
**Cache OpenRouter source using local model IDs**

The enrichment cache stores keys from the OpenRouter response (`openRouterContextLengths`), but source lookup later is done with local registry IDs via `GetOpenRouterContextLengthSource(modelID)` in `BuildModelSources`. For non-exact matches (the same function explicitly supports cases like a local `gemini-3.1-pro` matched from provider-prefixed IDs), source attribution will miss, and `/models/health` reports provider/static even when the context length came from OpenRouter.
```go
// Get all registered models and enrich those lacking context_length
allModels := registry.GetAvailableModels("openai")
```
**Include unavailable models in enrichment pass**

The enrichment loop uses `GetAvailableModels("openai")`, which excludes models whose clients are currently suspended/unavailable, but this function is intended to enrich registered models lacking `context_length`. During outages or quota suspension, those models are skipped entirely and remain unenriched until they become available again, undermining the management health metadata this change introduces.
Adds an optional temporal anti-drift system that injects a short
<temporal_context> XML element into the system prompt of outbound LLM
requests. Long-running agent sessions (>20 turns) frequently hallucinate
the current date; grounding every request in fresh wall-clock context
eliminates the drift.
Addresses the two blocking findings on the prior revision:
1. OPT-IN (was: unconditionally enabled)
- DefaultConfig() now returns {Enabled: false}. When the `temporal`
YAML block is omitted, behavior is identical to the non-temporal
build. No surprise request mutation.
- Enable via:

  ```yaml
  temporal:
    enabled: true
    inject_interval: 0
  ```
2. CONFIG WIRING (was: hardcoded temporal.DefaultConfig() at call sites)
- Config.Temporal is a *temporal.Config pointer, preserving the
nil-vs-explicit-zero distinction YAML can't otherwise express.
- Conductor reads the live runtime config on every request via
Manager.temporalConfig(); no reference to DefaultConfig() in the
hot path.
- ConfigOrDefault() helper exposes the same resolution logic to
downstream executors that need to re-apply injection.
Cross-cutting correctness:
- Claude OAuth (claude_executor.go): checkSystemInstructionsWithSigningMode
rebuilds the system array to mimic Claude Code's shape. It now
preserves any injected <temporal_context> block across the rewrite and
excludes it from the OAuth-sanitized forwarded user-system text, so
injection survives both the rewrite and the sanitizer.
- Antigravity (antigravity_executor.go): validateAntigravityRequestSignatures
followed by `req.Payload = originalPayload` discards any conductor
mutation. The executor now re-applies temporal.InjectIntoPayload after
validation at all three generation sites (Execute, streaming, retry
stream). CountTokens is intentionally left alone — it never reaches
generation.
- Other executors (OpenAI-compat, Codex, Gemini) don't overwrite
req.Payload, so the single conductor-level hook is sufficient.
Safety properties (covered by unit tests):
- No PII — only UTC and local timestamps.
- Format-aware — Claude (top-level `system`) vs OpenAI (`messages`).
- Dedup guard — a payload already containing `<temporal` skips injection.
- Image-model skip — imagen / image-gen are never modified.
- Thread-safe — request counter uses sync/atomic.
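The dedup and image-model guards in the list above can be sketched as a single predicate; the substring checks here are illustrative, not the package's exact matching rules:

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// skipInjection reports whether a payload must be left untouched:
// image-generation models never get injected, and a payload that
// already carries a temporal block is not injected twice.
func skipInjection(payload []byte, model string) bool {
	m := strings.ToLower(model)
	if strings.Contains(m, "imagen") || strings.Contains(m, "image-gen") {
		return true
	}
	return bytes.Contains(payload, []byte("<temporal"))
}

func main() {
	fmt.Println(skipInjection([]byte(`{}`), "imagen-3"))                             // true
	fmt.Println(skipInjection([]byte(`{"system":"<temporal_context/>"}`), "claude")) // true
	fmt.Println(skipInjection([]byte(`{"messages":[]}`), "claude-sonnet-4-6"))       // false
}
```

Matching on the `<temporal` prefix (rather than a full tag) is what makes the guard robust against both `<temporal>` and `<temporal_context>` spellings.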
Observability:
- Manager.SetConfig logs the effective temporal state on startup and on
every reload: `temporal injection state: enabled=%t interval=%d source=%s`
where source is `default` (YAML omitted) or `config` (explicit block).
Helps diagnose the class of reload-state regressions that are very
hard to catch otherwise.
Files:
- internal/temporal/temporal.go (+package, +doc)
- internal/temporal/temporal_test.go (6 unit tests)
- internal/config/config.go (Temporal *Config field)
- sdk/cliproxy/auth/conductor.go (hook + config resolution + log)
- internal/runtime/executor/claude_executor.go (OAuth preserve)
- internal/runtime/executor/claude_executor_test.go (regression test)
- internal/runtime/executor/antigravity_executor.go (re-inject x3)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed 1124bcd to 375e840 (Compare)
Adds a background enrichment job that populates `context_length` metadata for registered models by fetching OpenRouter's public models catalog (https://openrouter.ai/api/v1/models). Runs once on startup and then refreshes every 24 hours.

## Why

Clients that surface model capabilities (e.g. "supports 1M context") currently rely on hand-maintained entries in internal/registry/models/models.json. OpenRouter already publishes authoritative `context_length` values for every routed model; consuming that upstream eliminates the maintenance burden for any model OpenRouter fronts and keeps the catalog accurate as providers bump windows.

## Behavior

- Startup fetch populates the in-memory enrichment store; if the fetch fails (network, upstream outage, etc.) the registry falls back to the static JSON with no service impact.
- 24-hour periodic refresh keeps the cache fresh without hot-pathing the OpenRouter API.
- Management endpoint exposes enriched metadata so the UI and clients can display accurate context windows.
- model_registry.RefreshModels() now writes through to the live registration so the enriched count is observable and returned.

## Timeout pattern

Uses `http.Client{Timeout: 30s}` + `context.WithTimeout` — structurally identical to the existing precedent at `internal/misc/antigravity_version.go` (commits 3774b56, 8d5e470), which was accepted upstream for the same class of periodic metadata fetch. Happy to tighten the timeout or rewrite the pattern if the maintainer prefers — just point at the approved shape.

## Scope

This PR is intentionally scoped to OpenRouter enrichment only. The temporal anti-drift feature that was previously bundled in the same branch is now split out to router-for-me#2574.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@luispater — force-pushed a full rewrite addressing both blocking findings. This PR is now temporal-only; the OpenRouter enrichment you flagged separately is split out to #2880.

Blocker #1 — unconditional injection / no opt-in
Blocker #2 — OpenRouter enrichment
Scope
Also fixed under this scope

Two injection-stripping paths that were silent in the prior revision:
Verified

End-to-end against a running instance:
Ready for re-review when you have time.
💡 Codex Review — Reviewed commit: 375e840514
```go
if temporal.ShouldInject(temporal.ConfigOrDefault(e.cfg.Temporal)) {
	req.Payload = temporal.InjectIntoPayload(req.Payload, baseModel)
}
```
**Avoid double-counting temporal interval in Antigravity paths**

The Antigravity executor calls `ShouldInject` again after signature validation even though the conductor already evaluated it for the same outbound attempt, and `req.Payload` is reset to `originalPayload` before this second check. That means interval mode increments the global counter twice per Antigravity request; for example, with `inject_interval: 2` the second check is always true, so Antigravity requests are injected every time instead of every second request, and other interval values are similarly phase-shifted.
## Summary

- Injects `<temporal>` tags into every outbound request to all providers
- Format-aware: Claude format (top-level `system` array) vs OpenAI format (`messages` array) — no 400 errors

## Why This Matters
In long agent sessions, AI models hallucinate dates — wrong days, wrong weeks, wrong months. This cascades into incorrect tool calls, wrong file paths, and stale context decisions. By injecting fresh temporal context on every request, models stay grounded in the current date/time.
Pattern validated in `pi-rs`, where `TemporalDriftDetector` found real drift in sessions >20 turns.

## Changes
- `internal/temporal/temporal.go` — temporal tag builder + format-aware payload injection (~120 lines)
- `sdk/cliproxy/auth/conductor.go` — injects before `Execute()` and `ExecuteStream()`

## Test plan
- `go build -o cliproxyapi ./cmd/server/` — compilation succeeds
- `curl -s -X POST localhost:8317/v1/chat/completions -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" -d '{"model":"claude-sonnet-4-6","messages":[{"role":"user","content":"what day is it"}],"max_tokens":100}'`

🤖 Generated with Claude Code